Non-probabilistic alignment of rare German and English nominal expressions

Author

  • Bettina Schrader
Abstract

We present an alignment strategy that specifically deals with the correct alignment of rare German nominal compounds to their English multiword translations. It recognizes compounds and multiwords based on their character lengths and on their most frequent POS patterns, and aligns them based on their length ratios. Our approach is designed on the basis of a data analysis of roughly 500 German hapax legomena, and as it does not use any frequency or co-occurrence information, it is well suited to aligning rare compounds, but also achieves good results for more frequent expressions. Experimental results show that the strategy identifies correct translations for 70% of the compound hapaxes in our data set. Additionally, we checked 700 randomly chosen entries in the dictionary that was automatically generated by our alignment tool. The results of this experiment indicate that our strategy works for non-hapaxes as well, including finding multiple correct translations for the same head compound.
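The length-ratio idea in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the character threshold, the POS patterns, the expected ratio, and the tolerance are all hypothetical values chosen for illustration, and the names `Token`, `compound_candidates`, `multiword_candidates`, and `align` are assumptions introduced here.

```python
from dataclasses import dataclass

# All thresholds and patterns below are illustrative assumptions,
# not values taken from the paper.
MIN_COMPOUND_CHARS = 12          # German nouns at least this long count as compound candidates
MULTIWORD_POS_PATTERNS = {       # POS sequences accepted as English multiword candidates
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("NOUN", "ADP", "NOUN"),
}
EXPECTED_RATIO = 1.0             # assumed ideal German/English character-length ratio
MAX_RATIO_DEVIATION = 0.3        # assumed tolerance around the expected ratio


@dataclass(frozen=True)
class Token:
    text: str
    pos: str


def compound_candidates(sentence):
    """Return long German nouns, using character length as the only cue."""
    return [t.text for t in sentence
            if t.pos == "NOUN" and len(t.text) >= MIN_COMPOUND_CHARS]


def multiword_candidates(sentence):
    """Return English token spans whose POS sequence matches a known pattern."""
    found = []
    for n in sorted({len(p) for p in MULTIWORD_POS_PATTERNS}):
        for i in range(len(sentence) - n + 1):
            window = sentence[i:i + n]
            if tuple(t.pos for t in window) in MULTIWORD_POS_PATTERNS:
                found.append(" ".join(t.text for t in window))
    return found


def length_ratio(german, english):
    """Character-length ratio, ignoring spaces in the English multiword."""
    return len(german) / len(english.replace(" ", ""))


def align(german_sentence, english_sentence):
    """Pair each compound with the multiword whose length ratio is closest
    to the expected ratio. No frequency or co-occurrence counts are used."""
    multiwords = multiword_candidates(english_sentence)
    pairs = {}
    for compound in compound_candidates(german_sentence):
        scored = [(abs(length_ratio(compound, mw) - EXPECTED_RATIO), mw)
                  for mw in multiwords]
        scored = [s for s in scored if s[0] <= MAX_RATIO_DEVIATION]
        if scored:
            pairs[compound] = min(scored)[1]
    return pairs


german = [Token("Die", "DET"), Token("Krankenversicherung", "NOUN"), Token("zahlt", "VERB")]
english = [Token("The", "DET"), Token("health", "NOUN"),
           Token("insurance", "NOUN"), Token("pays", "VERB")]
result = align(german, english)
```

Because the sketch scores candidates from a single sentence pair only, it would in principle apply to a word seen once in a corpus, which is the property the abstract emphasizes for hapax legomena.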

Similar resources

The Role of Phonotactics in the Segmentation of Native and Non-Native Continuous Speech

Previous research has shown that listeners make use of their knowledge of phonotactic constraints to segment speech into individual words. The present study investigates the influence of phonotactics when segmenting a non-native language. German and English listeners detected embedded English words in nonsense sequences. German listeners also had knowledge of English, but English listeners had ...

Normalizing German and English Inflectional Morphology to Improve Statistical Word Alignment

German has a richer system of inflectional morphology than English, which causes problems for current approaches to statistical word alignment. Using Giza++ as a reference implementation of the IBM Model 1, an HMM-based alignment and IBM Model 4, we measure the impact of normalizing inflectional morphology on German-English statistical word alignment. We demonstrate that normalizing inflectional...

Experiments with word alignment, normalization and clause reordering for SMT between English and German

This paper presents the LIU system for the WMT 2011 shared task for translation between German and English. For English–German we attempted to improve the translation tables with a combination of standard statistical word alignments and phrase-based word alignments. For German–English translation we tried to make the German text more similar to the English text by normalizing German morphology...

The Problem of English Spatial, Non-spatial and Idiomatic Adpositions in Iranian EFL Environment: A Prototypical Approach

Several studies of L2 learners’ interlanguage have addressed the complexity of the English adpositional system due to several reasons like L1 transfer, lack of knowledge in L2 and the strong collocational relations of prepositions with other elements of the English language. The major purpose of the present study is to evaluate the performance of Iranian students in dealing with three broad cat...

CimS - The CIS and IMS Joint Submission to WMT 2015 addressing morphological and syntactic differences in English to German SMT

We present the CimS submissions to the WMT 2015 Shared Task for the translation direction English to German. Similar to our previous submissions, all of our systems are aware of the complex nominal morphology of German. In this paper, we combine source-side reordering and target-side compound processing with basic morphological processing in order to obtain improved translation results. We also...

Publication date: 2006